Regularization as a Toolkit for Parsimonious Modeling in Bioinformatics
Abstract
With the tremendous progress in bioinformatics, we face new challenges in which complex data sets must be analyzed to glean meaningful scientific knowledge. Although statistical tools were slow to emerge, it was recognized early that many different bioinformatics data sets share seemingly related issues. The data are generally overly complex, giving rise to complicated models that are often sub-optimal and generalize poorly outside the realm of the respective study. This violates scientific principles of causality. Many of the early successful modeling efforts in bioinformatics therefore sought a balance between model complexity and generalizing capacity. Over time, this has come to be termed “Regularization”. This review article investigates regularization as a solution for parsimonious model selection in complex biological data and can be seen as an attempt to gather the bits and pieces of many such earlier efforts under a single umbrella.
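As a rough illustration of the complexity-versus-generalization balance described in the abstract, the sketch below fits a ridge (L2-penalized) linear model to synthetic data with more features than samples, a shape common in expression data. Ridge regression is only one example of regularization chosen here for brevity; the function name `ridge_fit`, the penalty values, and the simulated data are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: minimizes ||y - Xw||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # The lam * I term keeps the system well-posed even when
    # there are fewer samples than features.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical synthetic data: 20 samples, 50 features, 3 informative ones.
rng = np.random.default_rng(0)
n_samples, n_features = 20, 50
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# A weak penalty allows a more complex fit; a strong penalty shrinks the weights.
w_weak = ridge_fit(X, y, lam=1e-3)
w_strong = ridge_fit(X, y, lam=10.0)
norm_weak = np.linalg.norm(w_weak)
norm_strong = np.linalg.norm(w_strong)
print(f"weight norm, weak penalty:   {norm_weak:.3f}")
print(f"weight norm, strong penalty: {norm_strong:.3f}")
```

Increasing the penalty shrinks the coefficient vector, trading some fit on the training data for a simpler model. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.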
Similar resources
A parsimonious threshold-independent protein feature selection method through the area under receiver operating characteristic curve
MOTIVATION Protein expression profiling for differences indicative of early cancer holds promise for improving diagnostics. Due to their high dimensionality, statistical analysis of proteomic data from mass spectrometers is challenging in many aspects such as dimension reduction, feature subset selection as well as construction of classification rules. Search of an optimal feature subset, commo...
ORBIT - Operating-Regime-Based Modeling and Identification Toolkit
ORBIT is a MATLAB-based toolkit for black-box and grey-box modeling of non-linear dynamic systems. The model representation is based on multiple local models valid in different operating regimes, that are smoothly blended into a global non-linear model. ORBIT is a computer-aided modeling environment that supports the interactive development of regime-based models on the basis of a mixture of empi...
Orbit - Operating Regime Based Modeling and Identification Toolkit
ORBIT is a MATLAB-based toolkit for black-box and grey-box modeling of non-linear dynamic systems. The model representation is based on multiple local models valid in different operating regimes that are being smoothly patched together into a global non-linear model. ORBIT is a computer-aided modeling environment that supports interactive development of regime based models on the basis of a mixt...
3D Inversion of Magnetic Data through Wavelet based Regularization Method
This study deals with the 3D recovering of magnetic susceptibility model by incorporating the sparsity-based constraints in the inversion algorithm. For this purpose, the area under prospect was divided into a large number of rectangular prisms in a mesh with unknown susceptibilities. Tikhonov cost functions with two sparsity functions were used to recover the smooth parts as well as the sharp ...
Regularized ROC method for disease classification and biomarker selection with microarray data
MOTIVATION An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operator characteristic) tech...